-
Notifications
You must be signed in to change notification settings - Fork 140
Flatbuffers impl #446
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flatbuffers impl #446
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust Benchmark
Benchmark suite | Current: 537ab54 | Previous: 67b7b70 | Ratio |
---|---|---|---|
rule-match-browserlike/brave-list |
2150693604 ns/iter (± 16942629 ) |
1803081237 ns/iter (± 16195247 ) |
1.19 |
rule-match-first-request/brave-list |
1017455 ns/iter (± 36469 ) |
980016 ns/iter (± 5452 ) |
1.04 |
blocker_new/brave-list |
158380700 ns/iter (± 1023726 ) |
203961371 ns/iter (± 2004867 ) |
0.78 |
memory-usage/brave-list-initial |
21536659 ns/iter (± 3 ) |
41762172 ns/iter (± 3 ) |
0.52 |
memory-usage/brave-list-after-1000-requests |
24141128 ns/iter (± 3 ) |
44355700 ns/iter (± 3 ) |
0.54 |
This comment was automatically generated by workflow using github-action-benchmark.
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that from_raw_parts
results in issues if bytes.as_ptr()
isn't aligned to 2 bytes We need to assert this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert added
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
src/flat_network_filter_list.rs
Outdated
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@boocmp use root_as_network_filter_list()
+ .expect()
to remove unsafe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it runs a verification process, it'll degrade performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we build the flatbuffer just before using root_as_network_filter_list_unchecked
should be fine in terms of security.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this check on deserialization
unsafe { | ||
self._tab | ||
.get::<flatbuffers::ForwardsUOffset<&str>>(NetworkFilter::VT_RAW_LINE, None) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
src/blocker.rs
Outdated
let mut disabled_directives: HashSet<String> = HashSet::new(); | ||
let mut enabled_directives: HashSet<String> = HashSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not &str
anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returned &str
back
if self.filter_map.is_empty() { | ||
return None; | ||
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
|
||
if self.filter_map.is_empty() { | ||
return filters; | ||
} | ||
|
||
let filters_list = | ||
unsafe { fb::root_as_network_filter_list_unchecked(&self.flatbuffer_memory) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
0238db7
to
a40fa5e
Compare
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
bytes.as_ptr() as *const u16, | ||
bytes.len() / std::mem::size_of::<u16>(), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
index: u32, | ||
owner: &'a NetworkFilterList, | ||
) -> Self { | ||
let list_address: *const NetworkFilterList = owner as *const NetworkFilterList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointer casting often indicates some problematic design patterns in Rust. I think we can remove it in this instance?
- this
owner
field can increase memory usage since it's an additional address-sized field on each filter, even though they'd all have the same value for all filters in a list - I only see it used in
NetworkMatchable
to make it conform to the existing API. I'm not opposed to that API changing if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FlatNetworkFilter
is a wrapper over the fb::NetworkFilter
. We only create the instance of it when found the the fb::NetworkFilter
in the NetworkFilterList
so it does't increase the memory usage because it temporary object on the stack. For now, regex manager is designed to use unique key to find corresponding regex, previously it was an address of the NetworkFilter
, but now we don't have unique address for every filter because they stored in flatbuffer. To create unique key we use the list address combined with the filter's index. I agree it looks ugly but I don't know how to do it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's understand why we need this unique key
.
If it's only for regex manager, let's:
- try to move key calculation here
- add TODO to make keys unique only in scope of regex manager. I believe once that in one of the next steps we move from multiple flatbuffers to a just one that ables to contain multiple lists. That way we get 1:1 proportion (regex manager: flatbuffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this could be done in follow up: this PR is already waiting for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 regex manager: 1 flatbuffer
definitely sounds like the right way to do it, but agreed it can be followup if needed
src/network_filter_list.rs
Outdated
#[derive(Serialize, Deserialize)] | ||
pub struct NetworkFilterList { | ||
pub(crate) flatbuffer_memory: Vec<u8>, | ||
pub(crate) filter_map: HashMap<Hash, Vec<u32>>, | ||
pub(crate) unique_domains_hashes_map: HashMap<Hash, u16>, | ||
} | ||
|
||
impl Default for NetworkFilterList { | ||
fn default() -> Self { | ||
Self { | ||
flatbuffer_memory: Default::default(), | ||
filter_map: Default::default(), | ||
unique_domains_hashes_map: Default::default(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Serialize, Deserialize, Default)]
should automatically produce this Default
implementation already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason it doesn't, I think because of the new version of serde
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh,,god, I forgot to add Default
it the derive list... I'll fix it in one of the follow ups
Added raw_line field to flatbuffers for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the other thing left unaddressed is #446 (comment), still not sure if it's required
src/data_format/mod.rs
Outdated
|
||
use crate::blocker::Blocker; | ||
use crate::cosmetic_filter_cache::CosmeticFilterCache; | ||
|
||
use crate::blocker::Blocker; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but I don't understand the reason to move Blocker
into a separate paragraph by itself (similar change also in src/data_format/v0.rs
)
[puLL-Merge] - brave/adblock-rust@446 DescriptionThis PR changes how serialization works in the adblock-rust library by implementing a FlatBuffers-based storage for network filters. The main motivations are to improve performance and reduce memory usage by replacing the old serialization system with a more efficient FlatBuffers approach. The PR also reorganizes the benchmarking structure by moving serialization benchmarks to a separate file. ChangesChanges
sequenceDiagram
participant App as Application
participant Engine as Engine
participant Serializer as Serializer
participant FB as FlatBuffers
participant NetworkList as NetworkFilterList
App->>Engine: serialize()
Engine->>Serializer: SerializeFormat::build()
Serializer->>FB: create FlatBuffersBuilder
loop For each network filter
FB->>FB: add filter to buffer
FB->>NetworkList: track domain hash indices
end
FB->>Serializer: finish buffer
Serializer->>Engine: return serialized data
Engine->>App: return Vec<u8>
App->>Engine: deserialize(data)
Engine->>FB: parse FlatBuffer data
FB->>NetworkList: rebuild filter map
FB->>NetworkList: rebuild domain hash maps
NetworkList->>Engine: return deserialized structures
Engine->>App: Ok(())
Possible Issues
|
NetworkFilterList
implementation is now based onFlatBuffers
.NetworkFilterList
creation has been removed. Since rebuilding the entireFlatBuffer
is costly, this functionality may be reintroduced in the future if there is a clear need for it.